Purpose

QC metrics has been used to justify the pipelien quality. Here we propose an analysis to evaluate the pipeline updates/changes by using Picard QC metrics.

Input data

The key input data for this test are two Picardtools QC metrics from two pipelines

First load functions we will use in this test

Here is the list of input parameters

Parse QC Metrics Files

The QC metrics files are the collection of Picardtools QC outputs, which are formated as N*M matrix The rows are QC metrics and columns are cells/samples.

Here is the list of metrics we will analysis with

##  [1] "detected_ratio"               "MT_ratio"                    
##  [3] "PCT_PF_READS_ALIGNED"         "PCT_PF_READS_IMPROPER_PAIRS" 
##  [5] "PCT_READS_ALIGNED_IN_PAIRS"   "PCT_CODING_BASES"            
##  [7] "PCT_INTERGENIC_BASES"         "PCT_INTRONIC_BASES"          
##  [9] "PCT_UTR_BASES"                "PCT_USABLE_BASES"            
## [11] "PCT_MRNA_BASES"               "PCT_RIBOSOMAL_BASES"         
## [13] "PERCENT_DUPLICATION"          "MEDIAN_5PRIME_TO_3PRIME_BIAS"
## [15] "MEDIAN_3PRIME_BIAS"           "MEDIAN_5PRIME_BIAS"          
## [17] "MEDIAN_CV_COVERAGE"           "PF_MISMATCH_RATE"            
## [19] "MEDIAN_INSERT_SIZE"

First we match column names(cell IDs) between two pipelines metrics

Then select the subset of QC metrics to run analysis.

Finally, we have two QC metrics to run analysis on.

Analysis

To evaluate the pipeline changes/updates, we will carry out statistical tests on each QC metric and these tests are described and visualized as following:

Let’s run the analysis mentioned above on the set of QC metrics.

## detected_ratio A: Density plot of detected_ratio  from two pipeline. B: Violin of  detected_ratio , labeled with Wilcoxon test result. C: Linear regression test of  detected_ratio between two pipelines. D: K-S test, D statistics is labeled

## MT_ratio A: Density plot of MT_ratio  from two pipeline. B: Violin of  MT_ratio , labeled with Wilcoxon test result. C: Linear regression test of  MT_ratio between two pipelines. D: K-S test, D statistics is labeled

## PCT_PF_READS_ALIGNED A: Density plot of PCT_PF_READS_ALIGNED  from two pipeline. B: Violin of  PCT_PF_READS_ALIGNED , labeled with Wilcoxon test result. C: Linear regression test of  PCT_PF_READS_ALIGNED between two pipelines. D: K-S test, D statistics is labeled

## PCT_PF_READS_IMPROPER_PAIRS A: Density plot of PCT_PF_READS_IMPROPER_PAIRS  from two pipeline. B: Violin of  PCT_PF_READS_IMPROPER_PAIRS , labeled with Wilcoxon test result. C: Linear regression test of  PCT_PF_READS_IMPROPER_PAIRS between two pipelines. D: K-S test, D statistics is labeled

## PCT_READS_ALIGNED_IN_PAIRS A: Density plot of PCT_READS_ALIGNED_IN_PAIRS  from two pipeline. B: Violin of  PCT_READS_ALIGNED_IN_PAIRS , labeled with Wilcoxon test result. C: Linear regression test of  PCT_READS_ALIGNED_IN_PAIRS between two pipelines. D: K-S test, D statistics is labeled

## PCT_CODING_BASES A: Density plot of PCT_CODING_BASES  from two pipeline. B: Violin of  PCT_CODING_BASES , labeled with Wilcoxon test result. C: Linear regression test of  PCT_CODING_BASES between two pipelines. D: K-S test, D statistics is labeled

## PCT_INTERGENIC_BASES A: Density plot of PCT_INTERGENIC_BASES  from two pipeline. B: Violin of  PCT_INTERGENIC_BASES , labeled with Wilcoxon test result. C: Linear regression test of  PCT_INTERGENIC_BASES between two pipelines. D: K-S test, D statistics is labeled

## PCT_INTRONIC_BASES A: Density plot of PCT_INTRONIC_BASES  from two pipeline. B: Violin of  PCT_INTRONIC_BASES , labeled with Wilcoxon test result. C: Linear regression test of  PCT_INTRONIC_BASES between two pipelines. D: K-S test, D statistics is labeled

## PCT_UTR_BASES A: Density plot of PCT_UTR_BASES  from two pipeline. B: Violin of  PCT_UTR_BASES , labeled with Wilcoxon test result. C: Linear regression test of  PCT_UTR_BASES between two pipelines. D: K-S test, D statistics is labeled

## PCT_USABLE_BASES A: Density plot of PCT_USABLE_BASES  from two pipeline. B: Violin of  PCT_USABLE_BASES , labeled with Wilcoxon test result. C: Linear regression test of  PCT_USABLE_BASES between two pipelines. D: K-S test, D statistics is labeled

## PCT_MRNA_BASES A: Density plot of PCT_MRNA_BASES  from two pipeline. B: Violin of  PCT_MRNA_BASES , labeled with Wilcoxon test result. C: Linear regression test of  PCT_MRNA_BASES between two pipelines. D: K-S test, D statistics is labeled

## PCT_RIBOSOMAL_BASES A: Density plot of PCT_RIBOSOMAL_BASES  from two pipeline. B: Violin of  PCT_RIBOSOMAL_BASES , labeled with Wilcoxon test result. C: Linear regression test of  PCT_RIBOSOMAL_BASES between two pipelines. D: K-S test, D statistics is labeled

## PERCENT_DUPLICATION A: Density plot of PERCENT_DUPLICATION  from two pipeline. B: Violin of  PERCENT_DUPLICATION , labeled with Wilcoxon test result. C: Linear regression test of  PERCENT_DUPLICATION between two pipelines. D: K-S test, D statistics is labeled

## MEDIAN_5PRIME_TO_3PRIME_BIAS A: Density plot of MEDIAN_5PRIME_TO_3PRIME_BIAS  from two pipeline. B: Violin of  MEDIAN_5PRIME_TO_3PRIME_BIAS , labeled with Wilcoxon test result. C: Linear regression test of  MEDIAN_5PRIME_TO_3PRIME_BIAS between two pipelines. D: K-S test, D statistics is labeled

## MEDIAN_3PRIME_BIAS A: Density plot of MEDIAN_3PRIME_BIAS  from two pipeline. B: Violin of  MEDIAN_3PRIME_BIAS , labeled with Wilcoxon test result. C: Linear regression test of  MEDIAN_3PRIME_BIAS between two pipelines. D: K-S test, D statistics is labeled

## MEDIAN_5PRIME_BIAS A: Density plot of MEDIAN_5PRIME_BIAS  from two pipeline. B: Violin of  MEDIAN_5PRIME_BIAS , labeled with Wilcoxon test result. C: Linear regression test of  MEDIAN_5PRIME_BIAS between two pipelines. D: K-S test, D statistics is labeled

## MEDIAN_CV_COVERAGE A: Density plot of MEDIAN_CV_COVERAGE  from two pipeline. B: Violin of  MEDIAN_CV_COVERAGE , labeled with Wilcoxon test result. C: Linear regression test of  MEDIAN_CV_COVERAGE between two pipelines. D: K-S test, D statistics is labeled

## PF_MISMATCH_RATE A: Density plot of PF_MISMATCH_RATE  from two pipeline. B: Violin of  PF_MISMATCH_RATE , labeled with Wilcoxon test result. C: Linear regression test of  PF_MISMATCH_RATE between two pipelines. D: K-S test, D statistics is labeled

## MEDIAN_INSERT_SIZE A: Density plot of MEDIAN_INSERT_SIZE  from two pipeline. B: Violin of  MEDIAN_INSERT_SIZE , labeled with Wilcoxon test result. C: Linear regression test of  MEDIAN_INSERT_SIZE between two pipelines. D: K-S test, D statistics is labeled

Outputs